Move trainer.step() before metric.update to overlap between backward pass and allreduce #1609

Merged
merged 1 commit into from
Feb 10, 2021

karan6181
Contributor

  • metric.update() is a synchronous call. Because it sat between loss.backward() and trainer.step(), it prevented communication from overlapping with computation.
  • Moved metric.update() after trainer.step() so that communication and computation can overlap. The correct order is therefore: forward pass -> calculate loss -> backward pass -> allreduce -> metric update.
  • Also fixed a minor error in the train_mask_rcnn.py script.
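
For reference, the reordered step can be sketched roughly as below. This is only an illustration of the call order described above, not the actual training script: FakeLoss, FakeTrainer, FakeMetric and train_step are hypothetical stand-ins that record when they are called, and they do not use or imitate the real MXNet/Gluon API beyond the method names mentioned in this PR.

```python
# Hypothetical stand-ins that only log the order of calls.
# They are NOT MXNet/Gluon classes; no real computation or communication happens.

class FakeLoss:
    def __init__(self, log):
        self.log = log

    def backward(self):
        # Stands in for the backward pass (gradient computation).
        self.log.append("backward")


class FakeTrainer:
    def __init__(self, log):
        self.log = log

    def step(self, batch_size):
        # Stands in for the allreduce of gradients plus the parameter update.
        self.log.append("trainer.step")


class FakeMetric:
    def __init__(self, log):
        self.log = log

    def update(self, labels, preds):
        # Stands in for the synchronous metric update described in this PR.
        self.log.append("metric.update")


def train_step(log, batch_size=1):
    """Run one step in the order this PR proposes and return the call log."""
    log.append("forward")          # forward pass
    log.append("loss")             # calculate loss
    loss = FakeLoss(log)
    loss.backward()                # backward pass
    FakeTrainer(log).step(batch_size)                 # allreduce before any sync call
    FakeMetric(log).update(labels=None, preds=None)   # sync call goes last
    return log


order = train_step([])
print(" -> ".join(order))
```

The point of the sketch is only that nothing synchronous sits between backward() and step(); before this change the metric update would have appeared between those two calls.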

@github-actions

Job PR-1609-a88e896 is done.
Docs are uploaded to http://gluon-vision-staging.s3-website-us-west-2.amazonaws.com/PR-1609/a88e896/index.html

@zhreshold zhreshold merged commit 1ff5446 into dmlc:master Feb 10, 2021